OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Skew detection and block classification of printed documents

Internal identifier: 001A94 (Main/Exploration); previous: 001A93; next: 001A95

Author: P.-Y Yin [Taiwan]

Source: Image and Vision Computing [ISSN 0262-8856]; Vol. 19, No. 8, pp. 567-579

RBID : ISTEX:0ADB24F0F575C646E541FE3013EB8AEA58EBBBC6

English descriptors

Block classification; Document analysis; Hough transform; Skew detection

Abstract

Since the number of paper-based office documents received daily is overwhelming, the development of document image analysis, which converts paper-based documents into electronic form, is becoming increasingly important. This paper describes a skew detection method that first smooths the black runs and locates the black–white transitions to emphasize the text lines. The skew angle is then determined by an improved Hough transform. For the block classification step, a rule-based classifier is presented. The classification rules are derived from the gray-level entropy, block aspect ratio, and run-length analysis. To evaluate the performance of the proposed methods, a test set of 100 different documents is used. The experimental results show that all 100 documents are successfully skew-corrected and that the precision and recall rates of the proposed block classifier are satisfactory.
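
The skew detection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's improved Hough transform: it is a plain Hough-style angle search over black-to-white transitions, and it omits the run-smoothing step. The function names `transition_points` and `estimate_skew`, the parameters `max_angle` and `step`, and the convention that 1 means black are all assumptions made for this illustration.

```python
import numpy as np

def transition_points(binary):
    """Return (ys, xs) of black pixels (1) that have a white pixel (0) directly below."""
    black = binary[:-1, :] == 1
    white_below = binary[1:, :] == 0
    return np.nonzero(black & white_below)

def estimate_skew(binary, max_angle=10.0, step=0.5):
    """Estimate the skew angle in degrees with a basic Hough-style search.

    Assumption: rows grow downward, so a positive angle means the text
    lines slope downward to the right.
    """
    ys, xs = transition_points(binary)
    if ys.size == 0:
        return 0.0  # nothing to vote with
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        theta = np.deg2rad(angle)
        # Signed distance of each point from a line through the origin at
        # this angle; points on a common text line share the same rho when
        # the candidate angle matches the skew.
        rho = ys * np.cos(theta) - xs * np.sin(theta)
        bins = np.round(rho).astype(int)
        counts = np.bincount(bins - bins.min())
        # Sum of squared votes rewards sharply peaked accumulators.
        score = float(np.sum(counts.astype(float) ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

Calling `estimate_skew(page)` on a binarized page array returns the candidate angle whose accumulator is most sharply peaked; the page could then be rotated by the opposite angle to correct the skew.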

DOI: 10.1016/S0262-8856(00)00098-6



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Skew detection and block classification of printed documents</title>
<author>
<name sortKey="Yin, P Y" sort="Yin, P Y" uniqKey="Yin P" first="P.-Y" last="Yin">P.-Y Yin</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0ADB24F0F575C646E541FE3013EB8AEA58EBBBC6</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0262-8856(00)00098-6</idno>
<idno type="url">https://api.istex.fr/document/0ADB24F0F575C646E541FE3013EB8AEA58EBBBC6/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000A02</idno>
<idno type="wicri:Area/Istex/Curation">000990</idno>
<idno type="wicri:Area/Istex/Checkpoint">001148</idno>
<idno type="wicri:doubleKey">0262-8856:2001:Yin P:skew:detection:and</idno>
<idno type="wicri:Area/Main/Merge">001B87</idno>
<idno type="wicri:Area/Main/Curation">001A94</idno>
<idno type="wicri:Area/Main/Exploration">001A94</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Skew detection and block classification of printed documents</title>
<author>
<name sortKey="Yin, P Y" sort="Yin, P Y" uniqKey="Yin P" first="P.-Y" last="Yin">P.-Y Yin</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Information Management, Ming Chuan University, 5 Teh-Ming Road, Gwei Shan, Taoyuan County 333</wicri:regionArea>
<wicri:noRegion>Taoyuan County 333</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="567">567</biblScope>
<biblScope unit="page" to="579">579</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">0ADB24F0F575C646E541FE3013EB8AEA58EBBBC6</idno>
<idno type="DOI">10.1016/S0262-8856(00)00098-6</idno>
<idno type="PII">S0262-8856(00)00098-6</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Block classification</term>
<term>Document analysis</term>
<term>Hough transform</term>
<term>Skew detection</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Since the number of paper-based office documents received daily is overwhelming, the development of document image analysis, which converts paper-based documents into electronic form, is becoming increasingly important. This paper describes a skew detection method that first smooths the black runs and locates the black–white transitions to emphasize the text lines. The skew angle is then determined by an improved Hough transform. For the block classification step, a rule-based classifier is presented. The classification rules are derived from the gray-level entropy, block aspect ratio, and run-length analysis. To evaluate the performance of the proposed methods, a test set of 100 different documents is used. The experimental results show that all 100 documents are successfully skew-corrected and that the precision and recall rates of the proposed block classifier are satisfactory.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Taïwan</li>
</country>
</list>
<tree>
<country name="Taïwan">
<noRegion>
<name sortKey="Yin, P Y" sort="Yin, P Y" uniqKey="Yin P" first="P.-Y" last="Yin">P.-Y Yin</name>
</noRegion>
<name sortKey="Yin, P Y" sort="Yin, P Y" uniqKey="Yin P" first="P.-Y" last="Yin">P.-Y Yin</name>
</country>
</tree>
</affiliations>
</record>
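
A record like the one above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed copy of the record. The names `FRAGMENT` and `summarize_record` are hypothetical, and the fragment leaves out the `wicri:`-prefixed attributes and elements: that prefix is not declared in the record as shown, so a namespace-aware parser would reject the full text unless a declaration were added.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record (assumption: wicri:-prefixed parts omitted).
FRAGMENT = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Skew detection and block classification of printed documents</title>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1016/S0262-8856(00)00098-6</idno>
</publicationStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Block classification</term>
<term>Document analysis</term>
<term>Hough transform</term>
<term>Skew detection</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

def summarize_record(xml_text):
    """Extract the title, DOI, and English keywords from a record string."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "doi": root.findtext(".//idno[@type='doi']"),
        "keywords": [term.text for term in root.iter("term")],
    }

if __name__ == "__main__":
    for key, value in summarize_record(FRAGMENT).items():
        print(key, "->", value)
```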

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A94 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A94 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:0ADB24F0F575C646E541FE3013EB8AEA58EBBBC6
   |texte=   Skew detection and block classification of printed documents
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024